Multiple dataset training Web Support by BryonLewis · Pull Request #503 · Kitware/dive

BryonLewis · 2020-12-16T16:06:56Z

Fixes #391

NOTE - Need the latest kitware/viame:gpu-algorithms-latest for the input_list to work properly.

Enabled Training menu button when one or more items are selected in a folder
Changes the input type for training to an array of folderIds and updates the API in the relevant locations.
The server now takes the folderIds as JSON data and runs a preprocess check on each dataset to ensure it contains a groundtruth csv file.
Updated the training to remove the labels.txt and use the new -il input_folder_list.txt and the -it input_groundtruth_list.txt for specifying the data.
Since it still requires a folder structure for orangization I kept the organize_folder_for_training but removed the labels.txt stuff.
--no-query is added to the groundtruth command so it will use all types that would be in the labels.txt by default and prevent the user from being prompted to accept.
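
A minimal sketch of how the two list files and the new arguments might be produced. The file names and the -il, -it, and --no-query flags come from the description above; the one-path-per-line layout, the function names, and the (folder, groundtruth) pairing are assumptions for illustration, and the training executable itself is left out because the thread does not name it.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def write_training_lists(
    datasets: Sequence[Tuple[Path, Path]], work_dir: Path
) -> Tuple[Path, Path]:
    """Write the folder list and groundtruth list, one dataset per line.

    Assumption: each file holds one path per line, in matching order.
    """
    folder_list = work_dir / "input_folder_list.txt"
    truth_list = work_dir / "input_groundtruth_list.txt"
    folder_list.write_text("".join(f"{folder}\n" for folder, _ in datasets))
    truth_list.write_text("".join(f"{truth}\n" for _, truth in datasets))
    return folder_list, truth_list


def training_args(folder_list: Path, truth_list: Path) -> List[str]:
    """Build the arguments appended to the training command (hypothetical helper)."""
    return ["-il", str(folder_list), "-it", str(truth_list), "--no-query"]
```

Keeping the two lists in the same order is what ties each dataset folder to its groundtruth file in this sketch.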

I've tested by taking two small datasets with different track types in them and training on them. Then I ran the trained model on another small dataset to ensure that it uses types from both datasets.

Additionally I trained across different folders by using the /viame/train endpoint and manually specifying folderIds across different root folders and different public users. It trained successfully and the resulting pipeline incorporated types across the different folders.
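
For readers who want to reproduce the manual call, here is a rough sketch of building such a request. Only the /viame/train path and the idea of sending an array of folderIds come from this PR. The POST method, the bare JSON array body, the API root, and the helper name are assumptions, and the real endpoint may require further parameters that are not described here. The request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Sequence


def build_train_request(
    api_root: str, token: str, folder_ids: Sequence[str]
) -> urllib.request.Request:
    """Build (but do not send) a training request for several datasets.

    Assumptions: POST with a JSON array body and a Girder-Token auth header.
    """
    body = json.dumps(list(folder_ids)).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_root.rstrip('/')}/viame/train",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Girder-Token": token},
    )
```

The folder IDs do not need to share a parent folder in this sketch, which matches the cross-folder test described above.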

subdavis · 2020-12-21T18:25:59Z

Since it still requires a folder structure for orangization

Could you explain this part? I didn't expect that anything would need to move. You could just do a simple download of each dataset from Girder, then point to the data in-place without moving anything.

Also, I expect you'd like to merge this before #487. I'm fine with that. Just to confirm, this will work with arbitrary dataset ids right? They don't have to be siblings?

BryonLewis · 2020-12-21T18:55:47Z

Could you explain this part? I didn't expect that anything would need to move. You could just do a simple download of each dataset from Girder, then point to the data in-place without moving anything.

Besides my massive spelling mistake there (orangization), that was a bad choice of words for the explanation. I had assumed that the test of whether ground_truth is a directory was in there because of some legacy items where the meta.detection might provide a directory instead of a csv, so that test needed to remain. Really all organize_folder_for_training does right now is check if the ground_truth is a directory, if it is it will copy the first .csv out of it into the training data directory associated with that groundtruth and then deletes the folder. If it is a file it just renames it to ground_truth.csv which is unnecessary but cleaner. I could keep all ground truth at the root level, but if I'm doing the test and possibly moving it, why not keep it a bit more organized? I should probably clean up the description and what it is called.

Also, I expect you'd like to merge this before #487. I'm fine with that. Just to confirm, this will work with arbitrary dataset ids right? They don't have to be siblings?

Yeah, that was the second part of my testing: I was training across different users' public folders. It just required that I manually create the array of dataset ids and call the endpoint, because there is no UI for it currently.

jjnesbitt

Just some minor things but it looks good, haven't tested locally yet

jjnesbitt · 2020-12-21T23:48:35Z

Really all organize_folder_for_training does right now is check if the ground_truth is a directory, if it is it will copy the first .csv out of it into the training data directory associated with that groundtruth and then deletes the folder. If it is a file it just renames it to ground_truth.csv which is unnecessary but cleaner.

For the record, the reason this is done is in case the girder item has multiple files. If it does, it's a folder when downloaded. Otherwise it's just a file. Since we still use the csv_detection_file method to ensure a csv file on the item, this is likely always the case.
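
The behavior discussed in this thread can be sketched roughly as follows. This is not the actual organize_folder_for_training code; the function name, signature, and the choice to sort the .csv files before taking the first one are assumptions made so the example is deterministic and self-contained.

```python
import shutil
from pathlib import Path


def normalize_ground_truth(ground_truth: Path, training_dir: Path) -> Path:
    """Ensure training_dir contains a single ground_truth.csv.

    A multi-file item downloads as a directory, a single-file item as a file.
    """
    target = training_dir / "ground_truth.csv"
    if ground_truth.is_dir():
        # Directory case: copy the first .csv out, then delete the folder.
        csv_files = sorted(ground_truth.glob("*.csv"))
        if not csv_files:
            raise FileNotFoundError(f"no .csv file found in {ground_truth}")
        shutil.copy(csv_files[0], target)
        shutil.rmtree(ground_truth)
    elif ground_truth != target:
        # File case: rename to the conventional name.
        ground_truth.rename(target)
    return target
```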

Co-authored-by: Jacob Nesbitt <jjnesbitt2@gmail.com>

subdavis

👍

subdavis · 2020-12-22T18:20:07Z

+            detections = list(
+                Item().find({"meta.detection": str(folderId)}).sort([("created", -1)])
+            )
+            detection = detections[0] if detections else None


Maybe refactor viame_detection.py _load_detections() helper function?

Eh, this can be done later.
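
If that refactor is picked up later, the selection logic in the snippet above (match on meta.detection, newest created first) could live in a small helper. The sketch below uses plain dictionaries in place of Girder's Item model so it runs on its own; latest_detection is a hypothetical name and is not the existing _load_detections() helper.

```python
from typing import Any, Iterable, Mapping, Optional


def latest_detection(
    items: Iterable[Mapping[str, Any]], folder_id: str
) -> Optional[Mapping[str, Any]]:
    """Return the most recently created item whose meta.detection matches."""
    matches = [
        item
        for item in items
        if (item.get("meta") or {}).get("detection") == str(folder_id)
    ]
    # Equivalent to sorting by "created" descending and taking the first item.
    return max(matches, key=lambda item: item["created"], default=None)
```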

BryonLewis added 3 commits December 15, 2020 15:41

client side stuff

9fc1494

Merge branch 'master' into multiple-dataset-training

079f6e3

working multi training

4b670cd

BryonLewis changed the base branch from client/training-ui to master December 16, 2020 16:13

BryonLewis linked an issue Dec 16, 2020 that may be closed by this pull request

[FEATURE] Allow training of multiple datasets at once #391

Closed

mend

7e1f6a3

BryonLewis force-pushed the multiple-dataset-training branch from a5c8bef to 7e1f6a3 Compare December 16, 2020 18:51

BryonLewis and others added 2 commits December 17, 2020 13:50

Merge branch 'master' into multiple-dataset-training

c4570a8

Adding in the new folder/groundtruth list

5b17e1a

BryonLewis force-pushed the multiple-dataset-training branch from 0a6567d to 5b17e1a Compare December 21, 2020 17:30

BryonLewis marked this pull request as ready for review December 21, 2020 17:41

BryonLewis requested review from jjnesbitt and subdavis December 21, 2020 17:48

jjnesbitt reviewed Dec 21, 2020

Comment thread server/viame_tasks/tasks.py Outdated

BryonLewis and others added 4 commits December 21, 2020 19:05

Update server/viame_tasks/tasks.py

c94b7d9

Co-authored-by: Jacob Nesbitt <jjnesbitt2@gmail.com>

Update server/viame_tasks/tasks.py

ce92ba4

Co-authored-by: Jacob Nesbitt <jjnesbitt2@gmail.com>

Update server/viame_tasks/tasks.py

bcada29

Co-authored-by: Jacob Nesbitt <jjnesbitt2@gmail.com>

Fixing comments and paranethesis

cbd9e99

subdavis approved these changes Dec 22, 2020

BryonLewis merged commit 20826a0 into master Dec 22, 2020

subdavis deleted the multiple-dataset-training branch December 22, 2020 19:26

BryonLewis merged 10 commits into master from multiple-dataset-training
